Goto

Collaborating Authors

 bipartite network


Common Structure Discovery in Collections of Bipartite Networks: Application to Pollination Systems

Lacoste, Louis, Barbillon, Pierre, Donnet, Sophie

arXiv.org Machine Learning

Bipartite networks are widely used to encode the ecological interactions. Being able to compare the organization of bipartite networks is a first step toward a better understanding of how environmental factors shape community structure and resilience. Yet current methods for structure detection in bipartite networks overlook shared patterns across collections of networks. We introduce the \emph{colBiSBM}, a family of probabilistic models for collections of bipartite networks that extends the classical Latent Block Model (LBM). The proposed framework assumes that networks are independent realizations of a shared mesoscale structure, encoded through common inter-block connectivity parameters. We establish identifiability conditions for the different variants of \emph{colBiSBM} and develop a variational EM algorithm for parameter estimation, coupled with an adaptation of the Integrated Classification Likelihood (ICL) criterion for model selection. We demonstrate how our approach can be used to classify networks based on their topology or organization. Simulation studies highlight the ability of \emph{colBiSBM} to recover common structures, improve clustering performance, and enhance link prediction by borrowing strength across networks. An application to plant--pollinator networks highlights how the method uncovers shared ecological roles and partitions networks into sub-collections with similar connectivity patterns. These results illustrate the methodological and practical advantages of joint modeling over separate network analyses in the study of bipartite systems.


Variational Estimators for Node Popularity Models

Karki, Jony, Huang, Dongzhou, Zhao, Yunpeng

arXiv.org Machine Learning

Node popularity is recognized as a key factor in modeling real-world networks, capturing heterogeneity in connectivity across communities. This concept is equally important in bipartite networks, where nodes in different partitions may exhibit varying popularity patterns, motivating models such as the Two-Way Node Popularity Model (TNPM). Existing methods, such as the Two-Stage Divided Cosine (TSDC) algorithm, provide a scalable estimation approach but may have limitations in terms of accuracy or applicability across different types of networks. In this paper, we develop a computationally efficient and theoretically justified variational expectation-maximization (VEM) framework for the TNPM. We establish label consistency for the estimated community assignments produced by the proposed variational estimator in bipartite networks. Through extensive simulation studies, we show that our method achieves superior estimation accuracy across a range of bipartite as well as undirected networks compared to existing algorithms. Finally, we evaluate our method on real-world bipartite and undirected networks, further demonstrating its practical effectiveness and robustness.


Involvement drives complexity of language in online debates

Amadori, Eleonora, Cirulli, Daniele, Di Martino, Edoardo, Nudo, Jacopo, Sahakyan, Maria, Sangiorgio, Emanuele, Santoro, Arnaldo, Zollo, Simon, Galeazzi, Alessandro, Di Marco, Niccolò

arXiv.org Artificial Intelligence

Language is a fundamental aspect of human societies, continuously evolving in response to various stimuli, including societal changes and intercultural interactions. Technological advancements have profoundly transformed communication, with social media emerging as a pivotal force that merges entertainment-driven content with complex social dynamics. As these platforms reshape public discourse, analyzing the linguistic features of user-generated content is essential to understanding their broader societal impact. In this paper, we examine the linguistic complexity of content produced by influential users on Twitter across three globally significant and contested topics: COVID-19, COP26, and the Russia-Ukraine war. By combining multiple measures of textual complexity, we assess how language use varies along four key dimensions: account type, political leaning, content reliability, and sentiment. Our analysis reveals significant differences across all four axes, including variations in language complexity between individuals and organizations, between profiles with sided versus moderate political views, and between those associated with higher versus lower reliability scores. Additionally, profiles producing more negative and offensive content tend to use more complex language, with users sharing similar political stances and reliability levels converging toward a common jargon. Our findings offer new insights into the sociolinguistic dynamics of digital platforms and contribute to a deeper understanding of how language reflects ideological and social structures in online spaces.


Understanding the Effect of Knowledge Graph Extraction Error on Downstream Graph Analyses: A Case Study on Affiliation Graphs

Cai, Erica, O'Connor, Brendan

arXiv.org Artificial Intelligence

Knowledge graphs (KGs) are useful for analyzing social structures, community dynamics, institutional memberships, and other complex relationships across domains from sociology to public health. While recent advances in large language models (LLMs) have improved the scalability and accessibility of automated KG extraction from large text corpora, the impacts of extraction errors on downstream analyses are poorly understood, especially for applied scientists who depend on accurate KGs for real-world insights. To address this gap, we conducted the first evaluation of KG extraction performance at two levels: (1) micro-level edge accuracy, which is consistent with standard NLP evaluations, and manual identification of common error sources; (2) macro-level graph metrics that assess structural properties such as community detection and connectivity, which are relevant to real-world applications. Focusing on affiliation graphs of person membership in organizations extracted from social register books, our study identifies a range of extraction performance where biases across most downstream graph analysis metrics are near zero. However, as extraction performance declines, we find that many metrics exhibit increasingly pronounced biases, with each metric tending toward a consistent direction of either over- or under-estimation. Through simulations, we further show that error models commonly used in the literature do not capture these bias patterns, indicating the need for more realistic error models for KG extraction. Our findings provide actionable insights for practitioners and underscores the importance of advancing extraction methods and error modeling to ensure reliable and meaningful downstream analyses.


BicliqueEncoder: An Efficient Method for Link Prediction in Bipartite Networks using Formal Concept Analysis and Transformer Encoder

Yang, Hongyuan, Peng, Siqi, Yamamoto, Akihiro

arXiv.org Artificial Intelligence

We propose a novel and efficient method for link prediction in bipartite networks, using \textit{formal concept analysis} (FCA) and the Transformer encoder. Link prediction in bipartite networks finds practical applications in various domains such as product recommendation in online sales, and prediction of chemical-disease interaction in medical science. Since for link prediction, the topological structure of a network contains valuable information, many approaches focus on extracting structural features and then utilizing them for link prediction. Bi-cliques, as a type of structural feature of bipartite graphs, can be utilized for link prediction. Although several link prediction methods utilizing bi-cliques have been proposed and perform well in rather small datasets, all of them face challenges with scalability when dealing with large datasets since they demand substantial computational resources. This limits the practical utility of these approaches in real-world applications. To overcome the limitation, we introduce a novel approach employing iceberg concept lattices and the Transformer encoder. Our method requires fewer computational resources, making it suitable for large-scale datasets while maintaining high prediction performance. We conduct experiments on five large real-world datasets that exceed the capacity of previous bi-clique-based approaches to demonstrate the efficacy of our method. Additionally, we perform supplementary experiments on five small datasets to compare with the previous bi-clique-based methods for bipartite link prediction and demonstrate that our method is more efficient than the previous ones.


Two-way Node Popularity Model for Directed and Bipartite Networks

Jing, Bing-Yi, Li, Ting, Wang, Jiangzhou, Wang, Ya

arXiv.org Machine Learning

There has been extensive research on community detection in directed and bipartite networks. However, these studies often fail to consider the popularity of nodes in different communities, which is a common phenomenon in real-world networks. To address this issue, we propose a new probabilistic framework called the Two-Way Node Popularity Model (TNPM). The TNPM also accommodates edges from different distributions within a general sub-Gaussian family. We introduce the Delete-One-Method (DOM) for model fitting and community structure identification, and provide a comprehensive theoretical analysis with novel technical skills dealing with sub-Gaussian generalization. Additionally, we propose the Two-Stage Divided Cosine Algorithm (TSDC) to handle large-scale networks more efficiently. Our proposed methods offer multi-folded advantages in terms of estimation accuracy and computational efficiency, as demonstrated through extensive numerical studies. We apply our methods to two real-world applications, uncovering interesting findings.


Link Prediction in Bipartite Networks

Özer, Şükrü Demir İnan, Orman, Günce Keziban, Labatut, Vincent

arXiv.org Artificial Intelligence

Bipartite networks serve as highly suitable models to represent systems involving interactions between two distinct types of entities, such as online dating platforms, job search services, or ecommerce websites. These models can be leveraged to tackle a number of tasks, including link prediction among the most useful ones, especially to design recommendation systems. However, if this task has garnered much interest when conducted on unipartite (i.e. standard) networks, it is far from being the case for bipartite ones. In this study, we address this gap by performing an experimental comparison of 19 link prediction methods able to handle bipartite graphs. Some come directly from the literature, and some are adapted by us from techniques originally designed for unipartite networks. We also propose to repurpose recommendation systems based on graph convolutional networks (GCN) as a novel link prediction solution for bipartite networks. To conduct our experiments, we constitute a benchmark of 3 real-world bipartite network datasets with various topologies. Our results indicate that GCN-based personalized recommendation systems, which have received significant attention in recent years, can produce successful results for link prediction in bipartite networks. Furthermore, purely heuristic metrics that do not rely on any learning process, like the Structural Perturbation Method (SPM), can also achieve success.


Bipartite Graph Variational Auto-Encoder with Fair Latent Representation to Account for Sampling Bias in Ecological Networks

Anakok, Emre, Barbillon, Pierre, Fontaine, Colin, Thebault, Elisa

arXiv.org Machine Learning

We propose a method to represent bipartite networks using graph embeddings tailored to tackle the challenges of studying ecological networks, such as the ones linking plants and pollinators, where many covariates need to be accounted for, in particular to control for sampling bias. We adapt the variational graph auto-encoder approach to the bipartite case, which enables us to generate embeddings in a latent space where the two sets of nodes are positioned based on their probability of connection. We translate the fairness framework commonly considered in sociology in order to address sampling bias in ecology. By incorporating the Hilbert-Schmidt independence criterion (HSIC) as an additional penalty term in the loss we optimize, we ensure that the structure of the latent space is independent of continuous variables, which are related to the sampling process. Finally, we show how our approach can change our understanding of ecological networks when applied to the Spipoll data set, a citizen science monitoring program of plant-pollinator interactions to which many observers contribute, making it prone to sampling bias.


BERT4FCA: A Method for Bipartite Link Prediction using Formal Concept Analysis and BERT

Peng, Siqi, Yang, Hongyuan, Yamamoto, Akihiro

arXiv.org Artificial Intelligence

We propose BERT4FCA, a novel method for link prediction in bipartite networks, using formal concept analysis (FCA) and BERT. Link prediction in bipartite networks is an important task that can solve various practical problems like friend recommendation in social networks and co-authorship prediction in author-paper networks. Recent research has found that in bipartite networks, maximal bi-cliques provide important information for link prediction, and they can be extracted by FCA. Some FCA-based bipartite link prediction methods have achieved good performance. However, we figured out that their performance could be further improved because these methods did not fully capture the rich information of the extracted maximal bi-cliques. To address this limitation, we propose an approach using BERT, which can learn more information from the maximal bi-cliques extracted by FCA and use them to make link prediction. We conduct experiments on three real-world bipartite networks and demonstrate that our method outperforms previous FCA-based methods, and some classic methods such as matrix-factorization and node2vec.


Efficient High-Quality Clustering for Large Bipartite Graphs

Yang, Renchi, Shi, Jieming

arXiv.org Artificial Intelligence

A bipartite graph contains inter-set edges between two disjoint vertex sets, and is widely used to model real-world data, such as user-item purchase records, author-article publications, and biological interactions between drugs and proteins. k-Bipartite Graph Clustering (k-BGC) is to partition the target vertex set in a bipartite graph into k disjoint clusters. The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics, to name a few. Existing approaches to k-BGC either output clustering results with compromised quality due to inadequate exploitation of high-order information between vertices, or fail to handle sizable bipartite graphs with billions of edges. Motivated by this, this paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs. HOPE obtains high scalability and effectiveness through a new k-BGC problem formulation based on the novel notion of high-order perspective (HOP) vectors and an efficient technique for low-rank approximation of HOP vectors. HOPE+ further elevates the k-BGC performance to another level with a judicious problem transformation and a highly efficient two-stage optimization framework. Two variants, HOPE+ (FNEM) and HOPE+ (SNEM) are designed when either the Frobenius norm or spectral norm is applied in the transformation. Extensive experiments, comparing HOPE and HOPE+ against 13 competitors on 10 real-world datasets, exhibit that our solutions, especially HOPE+, are superior to existing methods in terms of result quality, while being up to orders of magnitude faster. On the largest dataset MAG with 1.1 billion edges, HOPE+ is able to produce clusters with the highest clustering accuracy within 31 minutes, which is unmatched by any existing solution for k-BGC.